-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add client JSON storage API endpoint #1405
base: master
Are you sure you want to change the base?
Conversation
|
||
Some important notes about this feature: | ||
* Client storage is one blob per user per client. Said another way: clients are | ||
only able to store one blob per `clientid` and that blob is only for the user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouln't this be 'one blob per userid
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's both. It's one per clientid
and by the user that is authenticated. I'll change it slightly to maybe make it clearer.
api/v1_storage.inc
Outdated
|
||
$storage = new ApiClientStorage($clientid, $pguser); | ||
if ($method == "GET") { | ||
return json_decode($storage->get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should not the client decode and enode the JSON data so we are just dealing with strings here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API automatically converts returns to JSON values so if we don't decode this into an object we stringify it twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, of course, you are right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm still concerned about this. As I've discovered elsewhere, converting between javascript objects and arrays and php arrays and associative arrays via JSON can cause some screw-ups. Let me think about this some more.
In fact, given the existing API structure, we do want to encode it twice when returning the string to the client. The client encodes its javascript object into a JSON string and expects to get back the same string. The string is sent to the server and stored unchanged in the database and when it gets sent back it is encoded again, as you said, by the json_encode in index.php and then decoded by the response.json() in api.js. The client has to decode it again to recover the original javascript object.
So we don't want to do any encoding or decoding in v1_storage.inc.
{ | ||
global $api_client_storage_keys; | ||
|
||
if (!in_array($client, $api_client_storage_keys)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the client not already have been validated before we get here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would. I did it here too to ensure it's enforced at the data layer not just the user input layer if this is used elsewhere for some reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good overall structure, just a few comments about details.
description: JSON blob that was persisted | ||
content: | ||
application/json: | ||
schema: | ||
type: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it make sense to return the timestamp (something the client could use for caching) rather than the JSON blob that was sent by the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestamp is arguably just as useless as the JSON blob because it's always now()
. I was uncertain what a good return value would be so I modeled what we do for PUTs for word lists -- and what the product at my job does for PUTs and PATCHes -- which is to return the object that was actually serialized.
Open to ideas / thoughts on what API clients should "expect" from a successful PUT / PATCH / etc beyond the HTTP status code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestamp is arguably just as useless as the JSON blob because it's always now().
Sure as it represents the last modified timestamp from the server's perspective.
However I do think there is a lot more value to returning this compared to the blob that was sent and is known to the client. The timestamp would allow the client to implement some caching (e.g. don't check with the server if the timestamp is fresh enough), lower the bandwidth needed (by return the JSON object on a get
request if the timestamp doesn't match, useful for mobile devices) or potentially even allow for delta (this would need to switch to some event based logic). Arguably, neither of those scenarios applies here, but those are enabled by returning the timestamp 😄
Open to ideas / thoughts on what API clients should "expect" from a successful PUT / PATCH / etc beyond the HTTP status code.
I don't think we need to return anything from the API on success: the return code says if it was successful and we can return an optional error field to return the cause of errors to the client. Those 2 are what a minimal client would expect. We can always add more as we understand our needs more.
pinc/JsonStorage.inc
Outdated
*/ | ||
public function get(string $setting): ?string | ||
{ | ||
$value = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should move defining $value
where need it before the loop on l.69.
To enable this feature, add a string for the client to the | ||
`_API_CLIENT_STORAGE_KEYS` configuration setting and have the client use that | ||
string with the endpoint as the `clientid`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am having a hard time understanding the purpose of the API client storage keys. If the end goal is to allow any client to store this configuration down the road (which means that a user could trivially get one of our keys), is this some temporary measure to prevent arbitrary storage from storing JSON in our DB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have the gist of it, but it's not temporary.
Keep in mind that every user has de facto API access via PHP sessions -- that's how the Page Browser users the API and how the new proofreading UI will too. If we don't have some restriction, any logged in user could fill up our server with random JSON blobs. The clientid
means that each user can have only at most len(API_CLIENT_STORAGE_KEYS)
JSON blobs and those have a maximum size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "client" is the wrong word for this? I am using it in the "JS client that uses the APIs". Would "app" or some other word be clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having trouble visualizing it, too. Part of the discussion has centered around not only persisting settings across browsers and devices, but allowing different browsers on different devices to have settings specific to those devices (for example, if one device has a large monitor, the user may prefer the side-by-side version of the interface, but prefer top-to-bottom on a device with a smaller monitor).
My initial impression of "per client per user" was that multiple devices was covered because each user might be using different clients, thus, each user might have multiple entries, so I'm wondering if I'm just misunderstanding the terminology.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JS client (ie: new proofreading interface) will need to figure out which settings it should store in the local browser cache (and only available to that browser -- like window size) and which one it should store on the server (and available to any browser device -- like font family). All this endpoint does is handle saving and returning whatever the JS client sends it. We have to figure out the other parts but that's all done within the JS client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clientid means that each user can have only at most len(API_CLIENT_STORAGE_KEYS) JSON blobs and those have a maximum size.
I am not seeing any validation on the number of stored blobs, we only check that the client id is allow-listed but besides that the constraints on the DB would allow unbounded storage. Those are trusted clients so it doesn't seem to be a concern, just curious about this.
Perhaps "client" is the wrong word for this? I am using it in the "JS client that uses the APIs". Would "app" or some other word be clearer?
Mhh, is the idea to separate our browser users from other "app" that programmatically call our API?
You have the gist of it, but it's not temporary.
OK, as a side question since we need a logged in users to store the information, would it make sense to use the user rather than an extra key? Or are we thinking that some "apps" will backfill/mutate a lot of users' keys? I am probably missing something rather obvious here as I don't have much context on the new UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the questions and updates. I'll get the (excellent) code suggestions done later today.
api/v1_storage.inc
Outdated
|
||
$storage = new ApiClientStorage($clientid, $pguser); | ||
if ($method == "GET") { | ||
return json_decode($storage->get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API automatically converts returns to JSON values so if we don't decode this into an object we stringify it twice.
To enable this feature, add a string for the client to the | ||
`_API_CLIENT_STORAGE_KEYS` configuration setting and have the client use that | ||
string with the endpoint as the `clientid`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have the gist of it, but it's not temporary.
Keep in mind that every user has de facto API access via PHP sessions -- that's how the Page Browser users the API and how the new proofreading UI will too. If we don't have some restriction, any logged in user could fill up our server with random JSON blobs. The clientid
means that each user can have only at most len(API_CLIENT_STORAGE_KEYS)
JSON blobs and those have a maximum size.
description: JSON blob that was persisted | ||
content: | ||
application/json: | ||
schema: | ||
type: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestamp is arguably just as useless as the JSON blob because it's always now()
. I was uncertain what a good return value would be so I modeled what we do for PUTs for word lists -- and what the product at my job does for PUTs and PATCHes -- which is to return the object that was actually serialized.
Open to ideas / thoughts on what API clients should "expect" from a successful PUT / PATCH / etc beyond the HTTP status code.
{ | ||
global $api_client_storage_keys; | ||
|
||
if (!in_array($client, $api_client_storage_keys)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would. I did it here too to ensure it's enforced at the data layer not just the user input layer if this is used elsewhere for some reason.
Add a generic JSON-based storage class and an extension of it that will be used in the API to store client blobs. This is a similar structure to the Settings class / user_settings table but enforces a JSON value.
ac44218
to
974c9b6
Compare
PR feedback so far incorporated and resolved the merge conflicts with the documents API PR. |
api/v1_storage.inc
Outdated
if ($method == "GET") { | ||
return json_decode($storage->get()); | ||
} elseif ($method == "PUT") { | ||
$storage->set(json_encode(api_get_request_body())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it is already encoded by the client so we don't want to encode it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, given the existing API structure, we do want to encode it twice when returning the string to the client. The client encodes its javascript object into a JSON string and expects to get back the same string.
No, that's not how it works.
The Client isn't sending us an opaque string, it's sending us a JSON object. That comes through as a string and the ApiRouter decodes that string into an object -- it does this for all request bodies and it's how all of the functions work. Our ApiClientStorage get()
and set()
s take strings, so we have to encode that back into a string to store it.
Client sends JSON in request -> API converts to object -> v1_storage.inc
converts back to string -> ApiClientStorage persists string
When we get()
it from ApiClientStorage the class returns a string. But returns from API router functions needs to be an object, because the API will encode that into a string, so we need to decode it into an object first.
ApiClientStorage returns string -> v1_storage.inc
converts back to object -> API converts object to string -> Client receives JSON in return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, given the existing API structure, we do want to encode it twice when returning the string to the client. The client encodes its javascript object into a JSON string and expects to get back the same string.
No, that's not it works.
The Client isn't sending us an opaque string, it's sending us a JSON object. That comes through as a string and the ApiRouter decodes that string into an object -- it does this for all request bodies and it how all of the functions work. Our ApiClientStorage
get()
andset()
s take strings, so we have to encode that back into a string to store it.Client sends JSON in request -> API converts to object ->
v1_storage.inc
converts back to string -> ApiClientStorage persists stringWhen we
get()
it from ApiClientStorage the class returns a string. But returns from API router functions needs to be an object, because the API will encode that into a string, so we need to decode it into an object first.ApiClientStorage returns string ->
v1_storage.inc
converts back to object -> API converts object to string -> Client receives JSON in return
I see why we are at cross purposes; you are envisaging the client sending a javascript object whereas I was assuming the client would encode it first and send a JSON object. Sorry that's not what you said but in my scenario the client encodes the javascipt object into a JSON object and it gets doubly encoded in transport but decoded back into a JSON object by api_get_request_body() so we can put it straight into the database.
p.s. I think the api_get_request_body() is a bit confusing because when it says return $json, the thing it is returning generally isn't JSON but what was originally provided as data to the ajax function. (though in my scenario it now is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. I think the api_get_request_body() is a bit confusing because when it says return $json, the thing it is returning generally isn't JSON but what was originally provided as data to the ajax function. (though in my scenario it now is).
Yes, that's totally fair -- api_get_request_body()
returns a decoded object/array, not a string - $json_object
would be clearer.
And yes, the client is encoding the object into JSON -- we do that already in the ajax function:
if (upperCaseMethod !== "GET") {
// POST or PUT
options.headers["Content-Type"] = "application/json";
options.body = JSON.stringify(data);
}
The client just needs to pass in the object it wants persisted as $data
to ajax()
and it'll all just work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean. But I think there could be some problems with that approach. The original javascript object will get encoded into JSON and will get decoded into a PHP object and then re-encoded into JSON and the reverse on the way back. PHP objects don't correspond exactly to javascript objects. As we have seen, a php associative array which happens to have consecutive numerical keys or is empty is encoded as an ordinary array. What we get back may not be the same as what we started with.
The other approach of having the client make a JSON object with JSONstringify and then sending that (it would get doubly encoded) would, I imagine, be safer since we are only json-encoding and -decoding strings in the server. v1_storage would then not need to do any encoding or decoding but the client would have to encode and decode so the number of codings would end up much the same overall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I don't want to do that. When we save a page from the PI we send a string from the client and the same string gets put into the db. What I'm proposing is sending a JSON string in the same way and this same JSON string gets put into the db as JSON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The page is a string inside a JSON object -- it's not a naked string. So what you're proposing is having a JSON string as a value of a JSON object:
{
"data": "{\"key\": \"value\"}"
}
And then the API would extract data
out of the payload and pass the string into the ApiClientStorage class.
That's a weird and unexpected interface. No one does that -- just pass the JSON object 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's getting late here now. I'll consider it later but have a happy Christmas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've put up a commit that allows a routed function to get the non-deserialized input and output non-serialized output. This is less hacky than I thought it would be. It's in a separate commit for evaluation / discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Essentially yes but without the wrapping. However I like your idea of bypassing the encoding and decoding in the lower levels of the API much better.
api/v1_storage.inc
Outdated
|
||
$storage = new ApiClientStorage($clientid, $pguser); | ||
if ($method == "GET") { | ||
return json_decode($storage->get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I'm still concerned about this. As I've discovered elsewhere, converting between javascript objects and arrays and php arrays and associative arrays via JSON can cause some screw-ups. Let me think about this some more.
In fact, given the existing API structure, we do want to encode it twice when returning the string to the client. The client encodes its javascript object into a JSON string and expects to get back the same string. The string is sent to the server and stored unchanged in the database and when it gets sent back it is encoded again, as you said, by the json_encode in index.php and then decoded by the response.json() in api.js. The client has to decode it again to recover the original javascript object.
So we don't want to do any encoding or decoding in v1_storage.inc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. I think it could be slightly simplified.
@@ -73,7 +76,35 @@ class ApiRouter | |||
throw new InvalidAPI(); | |||
} | |||
|
|||
public static function get_router() | |||
public function request(bool $raw = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function could stay in index.php
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this here to keep the encoding and decoding together. And because I had to move the encoding here this came along with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems reasonable. I've had some further thoughts about this and deleted my other ananswered comments which had overlooked a change you made in index.php. It's difficult to see the whole picture in this github view so I dowloaded the files to look at them more carefully.
} | ||
} | ||
|
||
public function response(bool $raw = false): string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wouldn't need this if it was incorporated as above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I wanted to keep the encoding and decoding together and they got moved in here, breaking them out into clear functions makes it clearer what's going on IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for delay in the reply and merry Christmas! 🎄
To enable this feature, add a string for the client to the | ||
`_API_CLIENT_STORAGE_KEYS` configuration setting and have the client use that | ||
string with the endpoint as the `clientid`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The clientid means that each user can have only at most len(API_CLIENT_STORAGE_KEYS) JSON blobs and those have a maximum size.
I am not seeing any validation on the number of stored blobs, we only check that the client id is allow-listed but besides that the constraints on the DB would allow unbounded storage. Those are trusted clients so it doesn't seem to be a concern, just curious about this.
Perhaps "client" is the wrong word for this? I am using it in the "JS client that uses the APIs". Would "app" or some other word be clearer?
Mhh, is the idea to separate our browser users from other "app" that programmatically call our API?
You have the gist of it, but it's not temporary.
OK, as a side question since we need a logged in users to store the information, would it make sense to use the user rather than an extra key? Or are we thinking that some "apps" will backfill/mutate a lot of users' keys? I am probably missing something rather obvious here as I don't have much context on the new UI.
description: JSON blob that was persisted | ||
content: | ||
application/json: | ||
schema: | ||
type: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestamp is arguably just as useless as the JSON blob because it's always now().
Sure as it represents the last modified timestamp from the server's perspective.
However I do think there is a lot more value to returning this compared to the blob that was sent and is known to the client. The timestamp would allow the client to implement some caching (e.g. don't check with the server if the timestamp is fresh enough), lower the bandwidth needed (by return the JSON object on a get
request if the timestamp doesn't match, useful for mobile devices) or potentially even allow for delta (this would need to switch to some event based logic). Arguably, neither of those scenarios applies here, but those are enabled by returning the timestamp 😄
Open to ideas / thoughts on what API clients should "expect" from a successful PUT / PATCH / etc beyond the HTTP status code.
I don't think we need to return anything from the API on success: the return code says if it was successful and we can return an optional error field to return the cause of errors to the client. Those 2 are what a minimal client would expect. We can always add more as we understand our needs more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that to receive and return raw data we should really intervene in index.php in the functions api_get_request_body() and api_output_response() rather than make any changes in ApiRouter.
The function api_get_request_body() is easy by adding a 'raw' parameter as you have done but api_output_response() is more difficult. We could set a global variable to tell api_output_response() to send raw data but adding more global variables isn't something we want to do. Perhaps make a static API class to put the variable in. Would this be possible?
That's exactly why I put it in ApiRouter -- to avoid a global variable or adding a new artificial class with static variables. To me it made sense to put it in the router because the router is accepting a request and returning a response. |
Fair enough. |
@@ -55,15 +57,16 @@ class ApiRouter | |||
throw new MethodNotAllowed(); | |||
} | |||
$function = $url_map["endpoint"][$method]; | |||
return $function($method, $data, $query_params); | |||
$this->_response = $function($method, $data, $query_params); | |||
return $this->_response; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this line now?
I'm really not doing a good job of describing this which means that I've named things incredibly poorly in the code and I need to figure out how to fix this, my apologies. Let me zoom out just a little: to use the DP API you must be authenticated with an API key (or a valid PHP session which is a valid proxy) -- we do not allow unauthenticated access. So every request to the API is mapped to a user / username. We expect the API to be used by two primary consumers:
The endpoint in question is geared to enabling the new proofreading interface (that is: client) with the ability to store arbitrary (but well-structured) configuration information to persist across different browsers and devices. Fundamentally we want to allow a client to store one and only one blob per user. Because all endpoints are authenticated when they PUT to We have an allow-list for valid We can't rely on So in summary, right now the maximum number of blobs that could be stored in this interface is |
I'm going to do some slight renaming of some things in this to remove |
We want the new proofreading interface to be able to persist client configuration details across browsers and devices. This requires that some configuration values are stored server-side. To enable this, allow clients to store opaque JSON blobs on the server, up to one blob per client per user.
Please read the updates to
SETUP/API.md
on important caveats about this endpoint and ensure we're OK with this from a server-side perspective as well as the client side.This PR introduces a new generalized JsonStorage class and database table with an ApiClientStorage layer on top of that wired into the API. When JSON blobs are updated in the JsonStorage class their timestamp field in the database is also updated. This field isn't currently made available via the code right now but seems like it would be useful if, for instance, we want to clean up client blobs older than some period of time. I'm open to thoughts on this.
curl
examples to test this:This is likely to have minor merge conflicts with #1398 and I'll resolve those after that gets merged.